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ABSTRACT 

We investigate the effect of non-Gaussianity on the reconstruction of the 
initial mass field from the Lya forest. We show that the transmitted flux of 
QSO absorption spectra are highly non-Gaussian in terms of the statistics, the 
kurtosis spectrum and scale-scale correlation. These non-Gaussianities can not 
be completely removed by the conventional algorithm of Gaussianization, and 
the scale-scale correlations are largely retained in the mass field recovered by 
the Gaussian mapping. Therefore, the mass power spectrum recovered by the 
conventional algorithm is systematically lower than the initial mass spectrum 
on scales at which the local scale-scale correlation is substantial. To reduce the 
non-Gaussian contamination, we present two methods. The first is to perform 
the Gaussianization scale-by-scale using the discrete wavelet transform (DWT) 
decomposition. We show that the non-Gaussian features of the Lya forest 
basically will no longer exist in the scale-by-scale Gaussianized mass field. The 
second method is to choose a proper orthonormal basis (representation) to 
suppress the effect of the non-Gaussian correlations. In the quasilinear regime of 
cosmic structure formation, the DWT power spectrum is efficient for suppressing 
the non-Gaussian contamination. These two methods significantly improve the 
recovery of the mass power spectrum from the Lya forest. 

Subject headings: cosmology: theory - dark matter - quasars: absorption 
spectrum - large-scale structure of universe 
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1. Introduction 

Since high resolution QSO spectra became available, the transmitted flux in QSO 
spectra, or the Lya forest, offers an unprecedented opportunity to study the large-scale 
structure of the universe and its evolution at redshifts beyond the galaxy redshift catalog 
(e.g. Bi & Davidsen 1997 and references within). A basic goal of this study is to reconstruct 
the initial mass field. Assuming that these objects trace the underlying matter field in some 
way, it seems to be possible to trace the evolution of mass field back in time. Because the 
initial mass field is expected to be Gaussian in many models of the origin of fluctuations, 
reconstructing the initial mass fluctuations is synonymous with recovering the mass power 
spectrum (e.g. Croft et al. 1999.) 

The recovery of power spectrum from the Lya forests relies upon two theoretical 
conjectures. The first is to assume that the transmitted flux of a QSO Lya absorption 
spectrum is a point-to-point tracer of the underlying dark matter distribution. The Lya 
forest has been successfully modeled by the absorption of the ionized intergalactic gas, 
of which the distribution is continuous, and locally determined by the underlying dark 
matter distribution (Bi 1993; Fang et al. 1993; Bi, Ge & Fang 1995; Hernquist et al 1996; 
Bi & Davidsen 1997; Hui, Gnedin & Zhang 1997). Thus, the transmitted flux of a QSO 
absorption spectrum at a given redshift depends only on the mass density of dark matter 
at the position corresponding to the redshift. 

The second assumption is that the initial mass field can be recovered from the fiux of 
QSO spectrum by the Gaussianization algorithm (Weinberg 1992; Croft et al. 1998). With 
this method, the shape of 1-D initial Gaussian density field with an arbitrary normalization 
can be recovered approximately from the observed fiux by a point-to-point Gaussian 
mapping if the relation between fiux and mass density is monotonous, i.e. the higher the 
underlying mass density, the stronger the Lya absorption. The monotony would be a 
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good approximation in the weak nonlinear or quasilinear evolutionary regime of the cosmic 
clustering. 

This paper is trying to study the influence of the non-Gaussianity of the Lya forest 
on the recovery of the initial power spectrum. It is motivated by the recently systematic 
detection of non-Gaussianity of the Lya forests. Despite it is well known that the two-point 
correlation function of the Lyct absorption lines is quite weak, the distribution of these 
lines does show non-Gaussian behavior. For instance, it has been well known about ten 
years ago that the distribution of the nearest neighbor Lya line intervals is different from 
a Poisson process (Duncan, Ostriker, & Bajthk 1989; Liu and Jones 1990; Fang, 1991). 
Recently, the detection of the spectrum of higher order cumulants (Pando & Fang 1998a) 
and the scale-scale correlations (Pando et al. 1998) of the Lya forests imphes systematic 
non-Gaussianity on scales as large as about 10 h~^ Mpc. The abundance of the Lya line 
"clusters" identifled with respect to the richness is also found to be signiflcantly different 
from a Gaussian process (Pando & Fang 1996.) 

According to the philosophy of the Gaussianization reconstruction, all the non-Gaussian 
features of the Lya forests are not initial. It should be removed by the Gaussianization of 
the flux of QSO spectra. The recovered mass fleld should be Gaussian. The algorithm of 
Gaussian mapping is designed for removing the non-Gaussianities of the flux, and recovering 
a Gaussian mass fleld. 

The idea of Gaussianization is exquisite. However, we will show that even though the 
current algorithm of Gaussian mapping does map the distribution of the flux value into a 
Gaussian probability distribution function (PDF), the above-mentioned non-Gaussianities 
of the Lya forests still remain largely in the Gaussianized flux. Namely, the recovered fleld 
is not linear and Gaussian, but contaminated by the non-Gaussian behavior in the Lya 
forest. 
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It has been recognized that the estimation of power spectrum is significantly affected 
by the non-Gaussian behavior, such as the correlation between band-averaged power spectra 
which is essentially the scale-scale correlation (e.g. Meiksin & White 1999). Therefore, 
with the current algorithm of Gaussianization, the recovered power spectrum is distorted 
by the non-Gaussianity of the Lya forests. The question is then raised: how to improve 
the algorithm of Gaussianization in order to recover a mass field exempted from the 
non-Gaussianity of the Lya forest? Or how to suppress the effect of the non-Gaussian 
contamination in estimation of the power spectrum? We try to investigate these problems 
in this paper. 

This paper is organized as follows. In §2, using popular cold dark matter models, 
we present the non-Gaussian features in the transmitted fiux of the Lya forests. In §3, 
we demonstrate that the non-Gaussianity of mass field recovered by the conventional 
Gaussianization algorithm is about the same order as the original non-Gaussianity. Two 
alternatives which yield better Gaussianization are then proposed. In §4, the distortion of 
power spectrum by the non-Gaussianity is shown, and a possible way of suppressing the 
non-Gaussian effect on the power spectrum detection, i.e. properly choosing representation 
of the power spectrum, is suggested. We conclude this paper with a discussion of our 
findings in §5. 

2. The Non-Gaussian features of the Lya forests 

2.1. Samples of the Lya forests 

To investigate the recovery method of mass power spectrum, we generate the simulation 
samples of the Lya forest in the semi-analytic model of the intergalactic medium (IGM) 
developed by H.G. Bi et al. (Bi 1993; Fang et al. 1993; Bi, Ge & Fang 1995; Bi & Davidsen 
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1997.) This model can approximately fit most observed features of the Lya forest, including 
the column density distribution and the number density of the Lya forest lines; the 
distribution of the equivalent widths and their redshift dependence; the clustering and the 
Gunn-Peterson effect. Moreover, in this model, the relations among the dark matter field, 
the flux of the Lya absorption and the power spectrum of reconstructed initial mass fields 
are under control. It would be very useful to reveal the problems of the reconstruction. 

The model was described in detail in the above hsted references. We now give a 
brief account of, especially, the fundamental physics underlying in this model. The basic 
assumption of the model is that the density distribution of the baryonic diffuse matter 
n(x) in the universe is determined by the underlying dark matter density distribution via a 
lognormal relation as 



n(x) = hq exp 



5o(x) - 



2 

where no is the mean number density, and 5o{yi) is a Gaussian random field derived from 
the density contrast S^m of the dark matter by: 



1) 



in the comoving space, or 

in the Fourier space. To take into account the effect of redshift distortion, the pecuhar 
velocity field along the line of sight is also calculated by the simulation model (Bi 1993; 
Fang et al. 1993; Bi& Davidsen 1997.) 

The Gaussian field Sdm is produced in a cold dark matter model. To account for the 
baryonic effect on the transfer function, we adopt the fitting formula for power spectrum 
presented by Eisenstein & Hu (1999). Because the goal of this paper is mainly on examining 
the recovery method of power spectrum, we will not take into account the variants of CDM 
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family, but only the "standard" one, i.e. the flat model (JIq = 1-0) normalized by the 
4-year COBE data, and the F = Qoh is taken to be 0.3, where h denotes the normahzed 
Hubble parameter, and flo is the cosmological density parameter of total mass. This model 
is compatible with the galaxy correlation observed on scales of ~ 10 h~^ Mpc (Efstathiou 
et.al. 1992). The baryonic fraction in the total mass was fixed by the constraint from the 
primordial nucleosynthesis of flf, = 0.0125 h~^ (Walker et.al, 1991). 



The factor xi, in Eq. (2) is the Jeans length of IGM given by 



Xb = 



2tiHo 



(4) 



3/impf2(l + z 

where and /i are the density-average temperature and molecular weight of the IGM 
respectively, and 7 is the ratio of specific heats. The thermal equation of state of IGM is 
assumed to be polytropic, T oc n''~^ with 7 = 4/3. 

The lognormal relation, Eq.(l), has the following property: (1) When fluctuations 
are small, i.e. (n/no — 1) ~ 60, Eq. (1) is just the expected linear evolution of the 
IGM; (2) On small scales as |x — xi| -C x^, Eq. (1) becomes the well-known isothermal 
hydrostatic solution, which describes highly clumped structures such as intracluster gas, 
n oc exp{— fj^nipipDM /ikT) , where ipDM is the dark matter potential (Sarazin & Bahcall 
1977). 

The absorption optical depth at observed wavelength A is 

t{X)^ f\(^^)nHi{t)dt, (5) 

where zq — (A/Aq) — 1, to denotes the present time, tqso is the time corresponding to the 
redshift Zggo of the QSO, and so does for the relation between t and z; a is the absorption 
cross section at the Lya transition, and Aq = 12 16 A represents the Lya wavelength. The 
density of the neutral hydrogen atoms, uhi, can be found from n by the cosmic abundance 
of hydrogen, and photoionization equilibrium (Bi, Ge & Fang 1995.) 
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Obviously in this model, the relation between the transmitted flux, F{X) — e""^, 
and n(x) or 6o{x.) is basically local. The non-locahty is only caused by the width of the 
absorption cross section a, and the peculiar velocity of the neutral hydrogen. Therefore, 
F is approximately a point-to-point tracer of mass fluctuation 5o(x). Moreover, the F is 
monotonically related to Sq, and then, to the density contrast 5dm on scales larger than x^- 

In this paper, we produce the simulation samples in the redshift range z — 2.066 ~ 2.436 
with 2^^ pixels. The corresponding simulation size in the CDM model is 189.84 h~^Mpc in 
comoving space which is long enough to incorporate most of the fluctuation power. The 
selection of this redshift range is to compare the simulation with the Keck spectrum of 
HS1700+64. The spectrum of HS1700+64 ranges from 3723.012Ato 5523.554Awith the 
resolution 3 kms~^, or totally 55882 pixels in which the flrst 2^^ pixels are chosen here. 
These data have been used for testing the model considered in Bi & Davidsen (1997). 



2.2. The skewness and kurtosis spectra of the transmitted flux 

If appropriate parameters of the intergalactic UV background are adopted, the 
lognormal IGM model described in §2.1 could explain successfully many observed properties 
of the Lya forest and their evolution from redshift 2 to 4. Now we show that it also works 
against tests of the non-Gaussian features. 

We use the wavelet transform to analyze the non-Gaussian behavior of the transmitted 
flux F. As a 1-D fleld, the flux F{X) in the wavelength range of L = Xmax — ^min is subject 
to a discrete wavelet transform (DWT) as 

oo 2^-1 

F-F + m:^j,i^j,iW (6) 

j=0 1=0 

where '4>j,i{x), j = 0, 1, ...,/ = 0... 2-' — 1, is an orthogonal and complete set of the DWT basis 
(the details of the DWT, see e.g. Fang & Thews 1998). The wavelet basis ipj,i{x) is localized 
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both in the physical space and the Fourier (scale) space. The function ipj^iix) is centered at 
position of the physical space, and at wavenumber 2% x of the Fourier space. 

Therefore, the wavelet function coefficients (WFCs), e^y, have two subscripts j and /. They 
describe the fluctuation of the flux on scale at position IL/2K To be more specific, we 
will use the Daubechies 4 wavelet in this paper, although all conclusions are not affected by 
this particular choice as long as a compactly supported wavelet basis is used. 

The WFC, Ij^i, is computed by the inner product of 



Since the DWT bases are complete, the WFCs contain all information of the fiux. 

Note that the ipj,i{x) are orthogonal with respect to the position index I, and therefore, 
for an ergodic field, the 2^ WFCs at a given j, i.e. e^y, I — 0, 1...2^ — 1, can be treated as 
independent measures of the fiux field. The 2^ WFCs, e^y, from one reahzation of -F(A) can 
be employed as a statistical ensemble. In other words, when the fair sample hypothesis 
holds (Peebles 1980), an ensemble average can be estimated equivalently by averaging over 
Z, i.e. {ij^i) ~ (l/S'') J2f=o^ ^j,h where (...) denotes the ensemble average. The distribution of 
ij^i represents approximately the one-point distribution of the WFCs at a given scale j. 

The non-Gaussianity of the fiux -F(A) can be directly measured by the deviation of 
the one-point distribution from a Gaussian distribution. For this purpose, we calculate the 
cumulant moments defined by 



(7) 




(8) 




(11) 



(10) 



(9) 



-10- 



where 

M" = . 
2J 



m-^t^jJ: ihi - hiT- (12) 



=0 



The second order cumulant moment gives the DWT power spectrum (§4) (Pando 
& Fang 1998). For Gaussian fields all the cumulant moments higher than order 2 are 
zero. Thus one can measure the non-Gaussianity by with n > 2. We call the DWT 
spectrum of n-th cumulant. The cumulant measures /| and Ij are related to the well known 
skewness and kurtosis, respectively, defined by 

^3 ^ J^.l'r (14) 

Using the skewness and kurtosis spectra as statistical indicators, a significant 
non-Gaussian behavior has been found in the distribution of Lya forest lines (Pando & 
Fang 1998a). The skewness and kurtosis spectra of the transmitted flux in 100 simulated 
samples are shown in Figs. 1 and 2, respectively. To assess the statistical significance, the 
95% confidence range from 100 reahzations of Gaussian noise are also displayed in these 
figures. Clearly, the kurtosis spectrum of simulated F shows difference from the Gaussian 
noise spectra on the scales j > 8 (or < 1.5 h~^ Mpc) with 95% confidence. The skewness 
spectrum does not show significant difference from the Gaussian noise till j — 11 (~ 100 
h~^ kpc). These results are qualitatively consistent with that for the observed forest fine 
distributions. In addition, the skewness and kurtosis spectra of the flux of HS 1700+64 are 
also presented in Figs. 1 and 2. Obviously, the CDM model is in excellent agreement with 
the observation. 
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2.3. The scale-scale correlations of the transmitted flux 

The scale-scale correlations measure the correlations between the fluctuations on 
different scales (Pando et al. 1998, Pando, Valls-Gabaud, & Fang 1998, Feng, Deng & Fang 
2000). This non-Gaussianity is independent from the higher order cumulants (§2.2), which 
are only of j dependent. A simplest measure of the scale-scale correlation is given by 



where p is an even integer, and [ ]'s denote the integer part of the quantity. Because 

— L2//2^+^, the position I at scale j is the same as the positions 21 and 2Z -|- 1 at 
scale J + 1. Therefore, CJ'^ measures the correlation between fluctuations on scale j and 
J -|- 1 at the same physical point. For Gaussian flelds, Cj'^ = 1. Cj'^ > 1 corresponds to the 
positive scale-scale correlation, and CJ'^ < 1 to the negative case. One variant of the above 
deflnition is 

9(j+l) ~f) -f) 

rp,p - ^'=° S;[V2]+a; 

This statistics is for measuring the correlations between fluctuations on scales j and 
j + Ij but at different positions, i.e. the ffuctuation at scale j is displaced from the j + 1 
ffuctuation by a distance . 

The scale-scale correlation Cj' calculated from the simulated transmitted ffux and 

2 2 

HS1700-I-64 are shown in Fig. 3. Clearly, the values of Cj' are signiflcantly larger than 
unity and well above the Gaussian noise spectra on all the scales j > 7. This result is also 
qualitatively in agreement with the scale-scale correlation of the Lya forests (Pando et al. 
1998.). Figure 3 also indicates that the model of §2.1 is still in a good shape of fltting the 
observed non-Gaussian correlation. 

Similar to Eq.(15) for the correlation between scales \j — j'\ = 1, one may deflne, 
in principle, the correlation between two arbitrary scales with \j — j'\ > 1. However, 
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for hierarchical clustering the scale-scale correlation is quantified mainly by \j — j'\ — 1. 
Therefore, we will not calculate the scale-scale correlations for \j — j'\ > 1. 

3. The Non-Gaussian features of the Gaussian-recovered mass fields 

3.1. Non-Gaussianity after Gaussianization 

The cosmological reconstruction is to extract the power spectrum of the initial linear 
mass fluctuations from the observed distribution of various tracers of the evolved density 
field. The algorithm of Gaussianization was designed for recovering the primordial density 
fluctuations from an observed galaxy distribution (Weinberg 1992). This method has been 
recently applied to recovering the linear density field and its power spectrum from the 
observed transmitted fiux F of QSO absorption spectra (Croft et al 1998, 1999). 

The key step of the Gaussianization algorithm is a pixel-to-pixel mapping from an 
observed fiux F into the density contrast 5. The probability distribution function (PDF) 
of the observed transmitted fiux F is generally non-Gaussian, while the PDF of the initial 
density contrasts 6 = (n/no) — 1 is assumed to be Gaussian in large variety of galaxy 
formation models. The relation between F — exp(— r) and 5 is monotonic, i.e. high initial 
density 5 pixels evolved into high r pixels, low initial density pixels into low r pixels. 
Thus, using the observed F, one can sort out the total N pixels by the amount of F in the 
ascending order: the pixel with lowest F is labeled by 1st, the next higher F pixel is labeled 
by 2nd, and so on. For the n-th pixel, we then assign the density contrast 5, which is given 
by the solution of the equation (27r)~-^/^ /f^ exp(— x^/2)dx = n/N. Thus, the Gaussian 
mapping produces a mass field with the same rank order as the fiux but with a Gaussian 
PDF of (5(x). The overall amplitude of the recovered power spectrum should be determined 
by a separate procedure. For instance, we may set up the initial condition by using the 
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recovered spectrum, evolve the simulation to the observed redshift and then normalize the 
spectrum by requiring that the simulation reproduces the observed power spectrum of the 
transmitted QSO flux. This amplitude normalization is model-dependent. 

We apply the Gaussianization to 100 simulation samples of the QSO transmitted flux, 
and measure the skewness and kurtosis spectrum as well as the scale-scale correlation. The 
results are displayed in Figs. 4-6. For comparison, the non-Gaussian spectra of the flux in 
Figs. 1-3. are also plotted correspondingly. Figs. 4-6 show that the Gaussianized flux 
still largely exhibits non-Gaussian features. Especially, the scale-scale correlations of the 
Gaussianized fleld is as strong as the pre-Gaussianized flux on scales j > 10. That is, the 
recovered density fleld is seriously contaminated by the non-Gaussianities in the original 
flux. 

3.2. The efficiency of the conventional Gaussianization 

The reason for the lower efficiency of the conventional Gaussian mapping (§3.1) 
is simple. The initial Gaussian random mass fleld is assumed to be a superposition 
of independent modes, of which the PDFs are Gaussian. For instance, in the Fourier 
representation, all Fourier modes of a Gaussian mass fleld are Gaussian, i.e. they have 
Gaussian PDF of the amplitudes and randomized phases. The conventional algorithm 
considered only the Gaussianization of one variable, 5. It does not guarantee the 
Gaussianization of the amplitudes and phases of all relevant modes. In other words, the 
Gaussian mapping algorithm will work perfectly for a system with one stochastic variable, 
but not so for a fleld. 

Alternatively, this problem can also be seen via the DWT representation. Using eq.(l), 
any 1-D mass fleld given by density contrast S{x) {S — 0) can be decomposed with respect 
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to a DWT basis as 

j=0 1=0 

where the superscript M means mass. Equation (17) represents a hnear superposition 
of modes ^pj^i. As has been pointed out in §2.2, for a given j, the 2^ WFCs form a 
statistical ensemble. The distribution of the 2^ WFCs gives the one-point distribution of 
the amphtude of mode at the scale j. For the initial Gaussian mass field, these one-point 
distributions should be Gaussian. Obviously, the Gaussian PDF of 6 does not imply that 
the one-point distributions of the WFCs for all j are Gaussian (the central limit theorem). 
The amplitude S can only play the role as one variable of the field. 

Moreover, even when the one-point distributions of 2^ WFCs at all j are Gaussianizcd, 
the mass field could still be non-Gaussian. For instance, suppose the one-point distribution 
of the 2^ WFCs, e^, on scale j, is Gaussian. If the WFCs on scale j -|- 1 is given by 

where a and b are arbitrary constant, the one-point distribution of the 2^~^^ WFCs e^^-^ i 
is also Gaussian. However, Eq.(18) leads to a strong correlation between / and e^. 
This is an example of the scale-scale correlation, i.e. the scale j + 1 fluctuations are always 
proportional to those on the scale j at the same position. Moreover, this correlation can 
not be eliminated by the Gaussianization of e^. The Gaussian mapping changes all the 
WFCs at a given position (pixel) by a same amplifying or reducing factor, and therefore, 
the local relations Eq.(18) remains. 

The scale-scale correlations only depend upon the statistical behavior of the fluctuation 
distribution with respect to the index j. Therefore, a Gaussian fleld requires the 
uncorrelation between the distributions of WFCs with different j. This uncorrelation 
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corresponds to decorrelating the band average Fourier modes, which will be discussed in 
detail in §4. 



3.3. Algorithms of scale-by-scale Gaussianization 

Based on the considerations in the last section, we may design an algorithm which is 
capable of reducing the contamination of the non-Gaussianity, and produce fields with less 
non-Gaussianity. 

The new method is based on the scale-by-scale decomposition of flux and mass field. 
Prom Eq.(6), we have 

CO 2J'-1 

F = F^ + J2Y. ~^3',i^3',u (19) 

j'=j 1=0 

and 

^'=^+E E ~^3',li^3',l- (20) 
ji=0 1=0 

is actually a smoothed F by a filter on the scale j. There is a recursion relation in 
given by 

= + E^~.,^V',> (21) 

1=0 

Namely, flux can be reconstructed from flux F^ and 2^ WFCs Ij^i at the scale j. 
Similarly, for a mass distribution, we have 

oo 2J'-1 

<^ = '^^^ + E E ~^ni^3',u (22) 

j'=j 1=0 

^'■^E E'e~SV'/,^ (23) 

j'=0 1=0 

and 

S'^' = (24) 

1=0 
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Since the relations between F and 5 are local and monotonic, the smoothed flux 
depends only on the smoothed mass field 5^"^^, and one can perform a local and monotonic 
mapping between and 5^^^. Thus, we can implement the reconstruction of the mass 
field 5^^^ from F^^^ by a scale-by-scale Gaussianization algorithm (hereafter referred to as 
algorithm I): 

1. Supposing the reconstruction down to the scale j has been done, i.e. the 5^ is known 
already; 

2. Calculating the WFCs of the flux F on the scale j; 

3. Making the Gaussian mapping of the 2^ WFGs ej,/, and assigning the Gaussianized 
result, to the 2^ pixels according to the rank order. The distribution of Sj^i is Gaussian 
with zero mean and variance one. 

4. Finding the 2^ WFCs of mass field by 

6^ = US,,. (25) 

where the parameter is a normalization factor to be determined. The one-point 
distribution of the WFCs of mass field at the scale j, e^, is then Gaussianized. 

5. Reconstructing the mass field 5^^^ on scale j -|- 1 by the recursion relation Eq.(24) 

6. To determine the parameter z/, we require that the DWT power spectrum of the 
flux simulated from 5^^^ reproduces the observed flux F^^^. We have then 5^^^. The 
reconstruction of mass field on the scale j -|- 1 is done. 

Repeating the steps 1 to 6, one can reconstruct the mass field on scales from large 
to small until the scale of the resolution of the flux F, or the scale on which the relation 
between F and 5 is no longer local. 

Figure 7 illustrates the transmitted flux, the initial density field and the recovered 
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density field by algorithm I. The recovered ID density field is in excellent agreement with 
the original density field scale-by-scale. The non-Gaussianities of the recovered fields by 
algorithm I are shown in Figs. 4-6. The skewness and kurtosis spectra exhibit almost 
nothing but Gaussianity. The scale-scale correlation is also significantly reduced. 

The Gaussianization algorithm I is conceptually clear. However, it needs to determine 
the normalization factor u at each scale. Therefore, it is rather cumbersome to do the 
numerical calculation. Moreover, there is still somewhat residual scale-scale correlation in 
the recovered mass field. In fact, algorithm I does ensure the Gaussian PDF of e^, but it 
is unable to remove all the correlation between different modes, just as the simple example 
[Eq.(18)] demonstrated in §3.2. 

To avoid the multiple normalizations and keep the virtues of scale-by-scale 
Gaussianization, we design an alternative algorithm as follows (hereafter referred to as 
algorithm II): 

1. Using the conventional Gaussianization (§3.1) to reconstruct the mass field, i.e. 
to perform Gaussian mapping of the density contract 5 and normalize the mass field by 
requiring that the evolved simulations reproduce the power spectrum of the observed flux. 

2. Calculating the WFCs Ij^i of the recovered mass field 5^ on each scale j. 

3. Similar to the step 3 of algorithm I, making the Gaussianization of Ij^i for each scale 
j to produce unnormahzed WFCs 

4. Normalizing the WFCs £^ on scale j by requiring that the variance of i.e., the 
2nd cumulant moment /| [Eq.(7)], is the same as those for the WFCs e^y. 

5. For each scale j, randomizing the spatial sequence of the Gaussianized WFCs 
i.e., making a random permutation among the index I. 
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6. Using these WFCs one can reconstruct the mass density field by Eq.(24) till the 
scale given by the resolution of the flux. 

Algorithm II is still scale-by-scale in nature. However, the normalization is one only 
once for the recovered 5. The step 4 ensure that the normalization is unchanged after the 
step 3, which eliminates the non-Gaussianities of the skewness and kurtosis spectra. Step 
5 is for eliminating the residual scale-scale correlations by a randomization of the spatial 
index / of ij^i. Namely, it changes only the position of e^y, but not the values. Therefore, 
it is similar to a randomization of phases of the Fourier modes, and will not change the 
normalization of the amplitude and power spectrum of the fields. 

Figs. 4-6 show that the Gaussianized field by the algorithm II contains almost none 
of the non-Gaussian features considered. However, it should be pointed out that the field 
given by algorithm II is no longer a point-to-point reconstruction due to the randomization 
of I. Namely, the recovered field will not be point-to-point the same as the field shown 
in Fig. 7. Nonetheless, since the purpose of the Gaussianization is to recover the power 
spectrum of the primordial density fiuctuations, algorithm II is a valuable approach. As 
will be shown in next section, the algorithm II gives more unbiased estimation of power 
spectrum by the standard FFT technique. 

In order to illustrate the effect of the peculiar velocities on the Gaussianization, each of 
Figs. 4-6 contains two panels: one employed the simulation samples including the effects 
of peculiar velocities, and the other did not. All the figures show that for the algorithm I, 
the effect of peculiar velocities is significant only on small scales j > 9, or A; > 10 h Mpc~^; 
while for algorithm II, the effect of pecuhar velocities appears on smaller scales. Therefore, 
our proposed scale-by-scale Gaussianization methods would not be affected by the peculiar 
velocities as least up to the scale j=9. 
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4. Recovery of mass power spectrum from the transmitted QSO flux 

4.1. The power spectrum in different representations 

As a preparation for measuring the non-Gaussian effects on power spectrum recovery, 
we first discuss the representation of power spectrum. Principally, a random field can be 
described by any complete orthonormal basis (representation). Although the default usage 
of power spectrum is defined on the Fourier basis, one can define the power spectrum with 
respect to different representation. This is due to the fact that the Parseval's theorem holds 
for any complete and orthonormal basis decomposition. 

In the Fourier representation, the power spectrum of a 1-D density field 5{x) is given 



where 6n is the Fourier transform of 6{x). |(5„p measures the power of mode n because of 
Parseval's theorem 



Similarly, we have the Parseval's theorem for the DWT transform given by (Fang & 
Thews 1998, Pando & Fang, 1998b) 



by 



P{n) = \S, 



(26) 




(27) 




(28) 



(For simplicity, we ignore the superscript M on e^y). Therefore, the term i 
power of mode (j, /), and the total power on the scale j is 



describes the 




1=0 



|2 



(29) 



which defines the DWT power spectra Pj. 
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Generally, the second order correlation functions of 5„ or e^y can be converted from 
each other by 

+00 

= E (LWrAn'Mii^) (30) 



n.n '= — 00 



+00 2^ -1 2^ -1 

iW = E E E (6,,6,',,)4K^)#,r(^') (31) 

j,j'=0 1=0 l'=0 

where ipj,i{n) is the Fourier transform of ipj^i{x). For an homogeneous random field, 
(SnS*,) = {\Sn\^)Sn,n', wc havc then 

+00 

(^y = E {K\'MAn)\' (32) 

n=— 00 

or 

+00 

Pj- E ^(r^)|V'(r^/2^')r (33) 

n=— 00 

where ip{n/2^) is the Fourier transform of the generating wavelet '(/^{x) (Pando & Fang 
1998b). In Eq. (33) the function \t/j(n/2^)\'^ plays the role of window function in the 
wavenumber n space. The function ip{n) is localized in n-space. For the Daubcchics 4 
wavelet, |'0(ri)| is peaked at n = irip with the width of An^. Therefore, the DWT spectrum 
Pj gives an estimator of the "band averaged" Fourier power spectrum within the band 
centered at 

log n = (log 2) j + log Up, (34) 

with the band width, 

Alogn = Aup/up. (35) 

As the mean of the WFCs e^y over / is zero, so the second cumulant moment /| is 
related to DWT spectrum Pj by Ij — {L/2^)Pj, we will use variance /| as the estimator of 
DWT power spectrum instead of Pj. 

For a Gaussian field, the statistical behavior are completely determined by the second 
order statistics of the Gaussian variables 5n or ij^i. Theoretically, the power spectrum 
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estimators P{n) and Pj present the equivalent description. However, as will be shown 
below, once non-Gaussianity appears, these estimators will no longer be equivalent. 

4.2. Effect of non-Gaussianity on the recovery of mass power spectrum 

Using the 100 realizations of the mass density fields recovered by the conventional 
algorithm, and algorithm I and II of the Gaussianization, we calculated the power spectra 
by the standard FFT technique. To reveal the effect of non-Gaussianity on the power 
spectrum estimation, we do not include the effects of instrumental noise and continuum 
fitting in the synthetic spectra. The dominant sources of error in estimation of power 
spectrum would be the cosmic variance and the non-Gaussian effects. 

Figure. 8 compares the power spectra obtained by different Gaussianization methods. 
The 1-D linear power spectrum of Eq.(3) is also shown by sohd fine. These power spectra 
are normalized to the present. In general, the recovered power spectrum can match the 
shape of the linear theory over a wide range of wavelength, especially on larger wavelength. 
Yet, the recovered spectra show somewhat systematic departure from the initial mass 
power spectrum with the increase of wavenumbers. For the conventional Gaussianization, 
the recovered power spectrum falls below the initial power spectrum on scales of j > 8 or 
/c ~ 1.5 h~^ Mpc. The power spectrum recovered by the algorithm I is better than the 
conventional Gaussianization, and the recovery by algorithm II gives the best one, which is 
almost the same as the initial power spectrum on all scales. 

Comparing Fig. 8 with Figs. 4 - 6, we can see that the scales on which the depression 
of the recovered power spectrum appears is always the same as the scale on which the 
scale-scale correlations become significant. Moreover, the less the scale-scale correlation 
(Fig. 6), the less the depression. This indicates that the recovered spectrum is substantially 
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affected by the non-Gaussianities, especially, the scale-scale correlations. Actually, this 
effect has already been recognized by Meiksin & White (1998) in analyzing N-body 
simulation samples. Namely, the goodness of a power spectrum estimation is significantly 
dependent on the correlation between the Fourier power spectra averaged at different scale 
bands. 

Back to the definition of scale-scale correlation Eq. (15), and recall that the average 
over an ensemble is equivalent to the spatial average taken over one realization, Eq.(15) can 

be rewritten as 



Hence, the scale-scale correlation is actually a measure of the correlation between the 
Fourier power spectra averaged at different scale bands. This can also be seen from Eq.(31) 
that the Fourier power spectrum around n depends on the fluctuations on different j, and 
therefore, their non-Gaussian correlations. 

Because the algorithm II is most effective for ehminating the scale-scale correlations, 
the resulting power spectrum shows the best recovery of the linear model. 

4.3. Suppression of non-Gaussian correlations by representation 

In the DWT representation, the power spectrum (29) docs not depend on modes at the 
scales different from j, and therefore the scale-scale correlation will not affect the estimation 
of Pj. One can expect that the DWT spectrum estimator, Pj, will give a better recovery of 
the initial power spectrum. 

Fig. 9 displays the DWT power spectrum Pj for mass flelds given by the different 
Gaussianization methods. The DWT power spectrum in the hnear CDM model is also 
shown by solid line, which is calculated from the Fourier linear power spectrum by Eq. (33) 





(36) 
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in the continuous hmit of n. This figure indicates that even for the mass field recovered 
from conventional Gaussianization, the DWT power spectrum is in good agreement with 
the initial DWT mass power spectrum up to the scale j = 9. This is already much better 
than its counterpart in the Fourier representation, for which the power spectrum shows 
significant difference from the linear spectrum on scale j ~ 8. For the algorithm I and II, 
the DWT power spectrum also gives the good results. In addition, the errors due to the 
cosmic variance and normalization in the DWT spectrum are manifestly smaller than that 
of Fourier spectrum. 

The DWT power spectrum Pj (29) is given by the summation of |ej,;|^ over / at a 
given scale. Therefore, the non-Gaussian effect on estimation of Pj mainly arises from the 
correlation between the WFCs at different I, which can be measured by 

QfXi Sivcs the correlation between the density fluctuations on the same scale j at different 
places / and I + AL 

Fig. 10 displays the correlations Qf/^i with Al = 1. It shows that this non-Gaussianity 
can be ignored till j — 10. On the other hand, the scale-scale correlation C|'^ had been 
significant on j = 8 (Fig. 3). In result, the Fourier power spectrum is contaminated by the 
non-Gaussianity on j > 8, while the DWT power spectrum is less biased till j — 9. 

In a word, the non-Gaussian correlations are effectively suppressed in the DWT 
representation. The DWT spectrum estimator gives a better recovery of the initial power 
spectrum. 



5. Conclusions 



In the cosmological reconstruction of initial Gaussian mass power spectrum, a serious 
obstacle is the non-Gaussianity of the evolved field. The quahty of the recovery of the 
power spectrum is affected by the non-Gaussian correlations. The precision to which the 
mass power spectrum could be measured relies on how to treat the non-Gaussianity of the 
evolved mass field. 

In the quasi-nonlinear regime of cosmic gravitational clustering (like that traced by the 
Lya forests) , the dynamical evolution is characterized by the power transfer from large scale 
perturbations to small ones (Suto & Sasaki 1991). This is the mode-mode coupling which 
produces the scale-scale correlations. Using perturbation theory in the DWT representation, 
one can further show that the mode-mode coupling at the same position (local coupling) is 
much stronger than coupling between modes at different positions (non-local) (Pando, Feng 
& Fang 1999). On the other hand, the power spectrum in the quasi-nonlinear regime does 
not significantly differ from the linear regime. Therefore, the algorithm for recovering the 
initial mass power spectrum from the Lya forests should be designed to eliminate the local 
scale-scale correlations of the evolved mass field. 

Using simulations in semi- analytical model of the Lya forests, we show that the 
conventional algorithm of the Gaussianization is not enough to recover a Gaussian field. 
The local scale-scale correlations of the Lya forests are still retained in the Gaussianized 
mass field. Based on the DWT scale-space decomposition, we proposed two algorithms of 
the Gaussianization, which are effective to eliminate the non-Gaussian features. 

We showed that representation selection is important for the recovery of the power 
spectrum. A representation, which can effectively suppress the contamination of local 
scale-scale correlations, would be good for extracting the initial linear spectrum. We 
compared the Fourier and DWT representations for the estimation of power spectrum. 
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We demonstrated that, at least in the quasi-nonhnear regime, the DWT power spectrum 
estimator is better, because it can avoid the major contamination, the local scale-scale 
correlations. We also showed that the peculiar velocities of gas will not affect on the DWT 
power spectrum recover up to, at least, the scale j — 9. 
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Fig. 1. — The skewness spectrum of the Lya forests. The 95% confidence ranges of the 
skewness spectrum of the simulated samples and Gaussian noise are shown by the bars and 
gray band, respectively. The skewness spectrum of the Keck data of HS1700+64 is shown 
by squares and solid line. The physical scale related to j is 189.8 x h~^ Mpc in the CDM 
model. 
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Fig. 2. — The kurtosis spectrum of the Lja forests. The 95% confidence ranges of the 
kurtosis spectrum of the simulated samples and Gaussian noise are shown by the bars and 
grey band, respectively. The kurtosis spectrum of the Keck data of HS1700+64 is shown by 
squares and solid line. The physical scale related to j is 189.8 x h^^ Mpc in the CDM 
model. 
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Fig. 3. — The scale-scale correlation of the Lya forests. The 95% confidence ranges of the 
scale-scale correlation, Cj' , for the simulated samples and Gaussian noise are shown by the 
bars and grey band, respectively. The scale-scale correlation of the Keck data of HS 1700-1-64 
is shown by squares and solid line. The physical scale related to j is 189.8 x h~^ Mpc in 
CDM model. 
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Fig. 4. — The skewness spectrum of mass field recovered by the three Gaussianization 
methods. The upper panel is for transmitted flux samples free from the effect of peculiar 
velocity, and the lower panel is for samples affected by the peculiar velocity. The 95% 
confidence ranges of the skewness spectrum are shown by the symbols indicated in figure. 
The gray band is the 95% confidence ranges for the Gaussian noise samples. 
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Fig. 5. — The kurtosis spectrum of mass field recovered by the three Gaussianization 
methods. The upper panel is for transmitted flux samples free from the effect of peculiar 
velocity, and the lower panel is for samples affected by the peculiar velocity. The 95% 
confidence ranges of the skewness spectrum are shown by the symbols indicated in figure. 
The gray band is the 95% confidence ranges for the Gaussian noise samples. 
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Fig. 6. — The scale-scale correlation of mass field recovered by the three Gaussianization 
methods. The upper panel is for transmitted flux samples free from the effect of peculiar 
velocity, and the lower panel is for samples affected by the pecuhar velocity. The 95% 
confidence ranges of the skewness spectrum are shown by the symbols indicated in figure. 

The gray band is the 95% confidence ranges for the Gaussian noise samples. 
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Fig. 7. — An example of the density field (left panel) and flux (right panel) reconstructed 
by the scale-by-scale Gaussianization (left panel). The sohd hnes are for the reconstructed 
density field and fiux. The dot lines are for the original density field and fiux. Actually, the 
solid line and dot line are coincident in most places. 
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Fig. 8. — The Fourier power spectrum of the mass fields recovered by the conventional 
Gaussianization (left panel), algorithm 1 (central panel) and algorithm II(right panel). The 
error bars are given by la deviation calculated for the 100 realizations. The Fourier power 
spectrum of the linear CDM mode smoothed by Jeans filter [Eq.(3)] is shown by solid fine. 



-36- 




OOL OL L L"0 LO'O 

Fig. 9. — The DWT power spectra of the mass fields recovered by the conventional 
Gaussianization (left panel), algorithm I (central panel) and algorithm II(right panel). The 
error bars are given by la deviation calculated for the 100 reahzations. The DWT power 
spectrum of the linear CDM mode smoothed by Jeans filter [Eq.(3)] is shown by solid line. 
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Fig. 10. — The non-local correlations Qf^i with AZ = 1 of the recovered density field by the 
three Gaussianization methods. The symbols displayed in the figure have the same meanings 
as Fig. 6. 



